Globally sparse PLS regression
Authors

Tzu-Yu Liu, Electrical Engineering and Computer Science Department, University of Michigan, USA (e-mail: [email protected])
Laura Trinchera, MMIP Departement, AgroParisTech and UMR518 MIA, INRA, France (e-mail: [email protected])
Arthur Tenenhaus, Department of Signal Processing and Electronic Systems, Supélec, France (e-mail: [email protected])
Dennis Wei, Electrical Engineering and Computer Science Department, University of Michigan, USA (e-mail: [email protected])
Alfred O. Hero, Electrical Engineering and Computer Science Department, University of Michigan, USA (e-mail: [email protected])

Abstract

Partial least squares (PLS) regression combines dimensionality reduction and prediction using a latent variable model. It provides better predictive ability than principal component analysis because the dimension reduction takes both the independent and the response variables into account. However, PLS suffers from over-fitting when there are few samples but many variables. We formulate a new criterion for sparse PLS by adding a structured sparsity constraint to the global SIMPLS optimization. The constraint is a sparsity-inducing norm, which is useful for selecting the important variables shared among all the components. The optimization is solved by an augmented Lagrangian method to obtain the PLS components and to perform variable selection simultaneously. We propose a novel greedy algorithm to overcome the computational difficulties. Experiments demonstrate that our approach to PLS regression attains better performance with fewer selected predictors.

To appear in the 2013 Springer-Verlag book series "New perspectives in Partial Least Squares and Related Methods".
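The abstract describes adding a structured sparsity constraint to the SIMPLS criterion, solved via an augmented Lagrangian. As a rough illustration only (not the authors' algorithm), the sketch below applies per-component soft-thresholding to PLS1 weight vectors; the function name `sparse_pls1`, the penalty parameter `lam`, and the thresholding step are simplifying assumptions. The paper's method instead enforces a support shared jointly across all components.

```python
import numpy as np

def sparse_pls1(X, y, n_comp=2, lam=0.1):
    """Simplified sparse PLS1 sketch: deflation-based PLS with
    soft-thresholded weight vectors as a stand-in for a
    sparsity-inducing norm constraint (illustrative only)."""
    X = X - X.mean(axis=0)            # center predictors
    y = y - y.mean()                  # center response
    Xd = X.copy()
    W, T = [], []
    for _ in range(n_comp):
        w = Xd.T @ y                  # direction maximizing covariance
        # soft-threshold to zero out weakly relevant variables
        w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
        if np.linalg.norm(w) == 0:
            break                     # penalty killed all variables
        w /= np.linalg.norm(w)
        t = Xd @ w                    # latent score
        p = Xd.T @ t / (t @ t)        # loading
        Xd = Xd - np.outer(t, p)      # deflate X
        W.append(w)
        T.append(t)
    return np.array(W).T, np.array(T).T   # weights (p x k), scores (n x k)
```

Note this thresholds each component independently, so different components may select different variables; the "globally sparse" formulation in the abstract is precisely about avoiding that by penalizing all components' weights on a variable together.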
Related works
A comparison of partial least squares (PLS) and sparse PLS regressions in genomic selection in French dairy cattle.
Genomic selection involves computing a prediction equation from the estimated effects of a large number of DNA markers based on a limited number of genotyped animals with phenotypes. The number of observations is much smaller than the number of independent variables, and the challenge is to find methods that perform well in this context. Partial least squares regression (PLS) and sparse PLS wer...
High dimensional classification with combined adaptive sparse PLS and logistic regression
Motivation The high dimensionality of genomic data calls for the development of specific classification methodologies, especially to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection, which combined constitute a powerful framework for classification, as well as data visualization and interpretation. However, current proposed combinations le...
Penalized Partial Least Squares with Applications to B-Spline Transformations and Functional Data
We propose a novel framework that combines penalization techniques with Partial Least Squares (PLS). We focus on two important applications. (1) We combine PLS with a roughness penalty to estimate high-dimensional regression problems with functional predictors and scalar response. (2) Starting with an additive model, we expand each variable in terms of a generous number of B-Spline basis functi...
Sparse Kernel Orthonormalized PLS for feature extraction in large data sets
We propose a kernel extension of Orthonormalized PLS for feature extraction, within the framework of Kernel Multivariate Analysis (KMVA). KMVA methods have dense solutions and, therefore, scale badly for large datasets. By imposing sparsity, we propose a modified KOPLS algorithm with reduced complexity (rKOPLS). The resulting scheme is a powerful feature extractor for regression and classification...
A sparse PLS for variable selection when integrating omics data.
Recent biotechnology advances allow for multiple types of omics data, such as transcriptomic, proteomic or metabolomic data sets to be integrated. The problem of feature selection has been addressed several times in the context of classification, but needs to be handled in a specific manner when integrating data. In this study, we focus on the integration of two-block data that are measured on ...